Few-Shot Natural Language Processing by Meta-Learning Without Labeled Data
Humans show a remarkable capability to accurately solve a wide range of problems efficiently -- utilizing a limited amount of computation and experience. Deep learning models, by stark contrast, can be trained to be highly accurate on a narrow task while being highly inefficient in terms of the amount of compute and data required to reach that accuracy. Within natural language processing (NLP), recent breakthroughs in unsupervised pretraining have enabled reusable models that can be applied to many NLP tasks; however, learning new tasks is still inefficient. This has led to research on few-shot learning, where the goal is to generalize to new tasks with very few labeled instances. Meta-learning, or learning to learn, treats the learning process itself as a learning problem, with the goal of producing systems that can generalize to new tasks efficiently. This has the potential to produce few-shot learners that can accurately solve a wide range of new tasks. However, meta-learning requires a distribution over tasks with relevant labeled data, which can be difficult to obtain, severely limiting the practical utility of meta-learning methods. In this dissertation, we develop methods that enable large-scale meta-learning from unlabeled text and improve the few-shot generalization ability of NLP models.
We contribute methods that synthesize tasks from unlabeled text, providing a large task distribution for meta-learning. Meta-learning from millions of these self-supervised tasks enables rapid learning of new tasks and minimizes the train-test mismatch in few-shot learning by optimizing pre-training directly for future fine-tuning with a few examples. Since real-world applications of NLP require learning diverse tasks with different numbers of classes, we first introduce an optimization-based meta-learning method that can learn from multiple NLP classification tasks with any number of classes. We then leverage the proposed self-supervised approach to create meta-training tasks with a diverse number of classes and meta-train models for few-shot learning on this task distribution. This improves representation learning, learns key hyper-parameters such as learning rates, can be combined with supervised tasks to regularize supervised meta-learning, and leads to accurate few-shot learning on a diverse set of NLP classification tasks. We further explore the space of self-supervised tasks for meta-learning by considering important aspects such as task diversity, difficulty, type, domain, and curriculum, and investigate how they affect meta-learning performance. Our analysis shows that all these factors meaningfully alter the task distribution, some inducing significant improvements in the downstream few-shot accuracy of the meta-learned models.
Our findings yield accurate and efficient meta-learning methods that improve few-shot generalization to diverse tasks and should enable many future applications of meta-learning in NLP, such as hyper-parameter optimization, continual learning, efficient learning, learning in low-resource languages, and more.
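The self-supervised task creation the abstract describes can be illustrated with a toy sketch: build an N-way classification task from unlabeled sentences by masking a chosen word in each example, with the label being which word was masked. The function name, whitespace tokenization, and sampling scheme below are illustrative assumptions, not the dissertation's exact procedure.

```python
import random

def make_self_supervised_task(sentences, num_classes=3, shots=2, seed=0):
    """Toy sketch: create an N-way, K-shot classification task from
    unlabeled text. Each class corresponds to one masked-out word;
    the model must infer which word was masked."""
    rng = random.Random(seed)
    # Index sentences by the words they contain (whitespace tokens).
    by_word = {}
    for s in sentences:
        for w in set(s.split()):
            by_word.setdefault(w, []).append(s)
    # Keep words frequent enough to supply `shots` examples per class.
    candidates = [w for w, sents in by_word.items() if len(sents) >= shots]
    words = rng.sample(candidates, num_classes)
    task = []
    for label, w in enumerate(words):
        for s in rng.sample(by_word[w], shots):
            masked = " ".join("[MASK]" if t == w else t for t in s.split())
            task.append((masked, label))
    return words, task
```

Sampling millions of such tasks from a large corpus yields the kind of task distribution over which meta-learning can be run without any human labels.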
A Moment in the Sun: Solar Nowcasting from Multispectral Satellite Data using Self-Supervised Learning
Solar energy is now the cheapest form of electricity in history. Unfortunately, significantly increasing the electric grid's fraction of solar energy remains challenging due to its variability, which makes balancing electricity's supply and demand more difficult. While thermal generators' ramp rate (the maximum rate at which they can change their energy generation) is finite, solar energy's ramp rate is essentially infinite. Thus, accurate near-term solar forecasting, or nowcasting, is important to provide advance warnings to adjust thermal generator output in response to variations in solar generation to ensure a balanced supply and demand. To address the problem, this paper develops a general model for solar nowcasting from abundant and readily available multispectral satellite data using self-supervised learning.
Specifically, we develop deep auto-regressive models using convolutional neural networks (CNN) and long short-term memory networks (LSTM) that are globally trained across multiple locations to predict raw future observations of the spatio-temporal spectral data collected by the recently launched GOES-R series of satellites. Our model estimates a location's near-term future solar irradiance based on satellite observations, which we feed to a regression model trained on smaller site-specific solar data to provide near-term solar photovoltaic (PV) forecasts that account for site-specific characteristics. We evaluate our approach for different coverage areas and forecast horizons across 25 solar sites and show that it yields errors close to that of a model using ground-truth observations.
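The auto-regressive structure of the pipeline (predict future satellite frames from past frames, then map the prediction to a site-level quantity) can be conveyed with a deliberately simplified stand-in. The linear least-squares model below is an illustrative toy, not the paper's CNN-LSTM; the function names and the flattened-frame representation are assumptions for this sketch.

```python
import numpy as np

def fit_ar_model(frames, k=3):
    """Fit a linear auto-regressive model predicting frame t from the
    k preceding frames. Toy stand-in for a CNN-LSTM next-frame model.
    frames: array of shape (T, H, W)."""
    T = frames.shape[0]
    # Each training input is the k previous frames, flattened.
    X = np.stack([frames[t - k:t].reshape(-1) for t in range(k, T)])
    y = frames[k:].reshape(T - k, -1)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(frames, coef, k=3):
    """Predict the next frame from the last k observed frames."""
    x = frames[-k:].reshape(1, -1)
    return (x @ coef).reshape(frames.shape[1:])
```

In the paper's setup, the predicted spectral observations would then be fed to a separate regression model trained on a small amount of site-specific data to produce the PV forecast.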
Simultaneously Linking Entities and Extracting Relations from Biomedical Text Without Mention-level Supervision
Understanding the meaning of text often involves reasoning about entities and
their relationships. This requires identifying textual mentions of entities,
linking them to a canonical concept, and discerning their relationships. These
tasks are nearly always viewed as separate components within a pipeline, each
requiring a distinct model and training data. While relation extraction can
often be trained with readily available weak or distant supervision, entity
linkers typically require expensive mention-level supervision -- which is not
available in many domains. Instead, we propose a model which is trained to
simultaneously produce entity linking and relation decisions while requiring no
mention-level annotations. This approach avoids cascading errors that arise
from pipelined methods and more accurately predicts entity relationships from
text. We show that our model outperforms a state-of-the-art entity linking and relation extraction pipeline on two biomedical datasets and can drastically improve the overall recall of the system.
Comment: Accepted in AAAI 202
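The advantage of joint decisions over a pipeline can be sketched with a toy joint decoder: rather than committing to the highest-scoring entity link first and then classifying the relation, score complete (entity, relation, entity) assignments together, so strong relation evidence can rescue a lower-ranked link. The score dictionaries and names below are hypothetical placeholders, not the paper's model.

```python
import itertools

def joint_decode(cands1, cands2, link_score, rel_score, relations):
    """Pick the (entity, relation, entity) triple with the best combined
    linking + relation score, letting each decision inform the other."""
    best_triple, best_score = None, float("-inf")
    for e1, e2 in itertools.product(cands1, cands2):
        for r in relations:
            s = (link_score[e1] + link_score[e2]
                 + rel_score.get((e1, r, e2), float("-inf")))
            if s > best_score:
                best_triple, best_score = (e1, r, e2), s
    return best_triple, best_score
```

A greedy pipeline would link each mention to its top candidate and only then classify the relation; the joint search above avoids that cascading error by construction, at the cost of enumerating candidate combinations.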
Relating Romanized Comments to News Articles by Inferring Multi-Glyphic Topical Correspondence
Commenting is a popular facility provided by news sites, and analyzing such user-generated content has recently attracted research interest. However, in multilingual societies such as India, this analysis is hard for several reasons: (1) There are more than 20 official languages, but linguistic resources are available mainly for Hindi. It is observed that people frequently use romanized text, as it is easy and quick to type on an English keyboard, resulting in multi-glyphic comments, where the texts are in the same language but in different scripts. Such romanized texts are almost unexplored in machine learning so far. (2) In many cases, comments are made on a specific part of the article rather than the topic of the entire article. Off-the-shelf methods such as correspondence LDA are insufficient to model such relationships between articles and comments. In this paper, we extend the notion of correspondence to model multi-lingual, multi-script, and inter-lingual topics in a unified probabilistic model called the Multi-glyphic Correspondence Topic Model (MCTM). Using several metrics, we verify our approach and show that it improves over the state-of-the-art.
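The core correspondence idea (each comment word's topic is drawn from the paired article's topic distribution, tying comments to what the article is about) can be shown as a toy generative step. The vocabularies and distributions below are invented for illustration and are far simpler than MCTM's multi-lingual, multi-script machinery.

```python
import random

def generate_comment(article_topic_dist, topic_words, n_words, seed=0):
    """Toy correspondence step: every comment word first picks a topic
    from the ARTICLE's topic distribution, then a word from that topic."""
    rng = random.Random(seed)
    topics = list(range(len(article_topic_dist)))
    out = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=article_topic_dist)[0]
        vocab, probs = topic_words[z]
        out.append(rng.choices(vocab, weights=probs)[0])
    return out
```

Because comment topics are constrained to the article's topics, inference under such a model can attribute a comment to the specific part of the article it responds to, which is the relationship plain LDA cannot capture.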